impr(polyglot): Add new structure (wordsetMap) to PolyglotWordset to get words from any language by uniform distribution (@IliyaZinoviev)#7440
Conversation
2d5e9ad to
ed49d47
Compare
…t words from any language by uniform distribution (@IliyaZinoviev)
ed49d47 to
52be7ba
Compare
There was a problem hiding this comment.
Pull request overview
Adjusts polyglot word generation so languages are sampled (approximately) uniformly, rather than being dominated by languages with larger dictionaries.
Changes:
- Refactors
PolyglotWordsetto store per-languageWordsets (wordsMap) and pick a language uniformly per generated word. - Updates word generation logic to use
PolyglotWordset.currentLanguageinstead of per-word language lookups. - Small control-flow cleanup in word list retrieval / funbox wordset assignment.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/src/ts/test/words-generator.ts | Switches polyglot language detection to currentLanguage; minor cleanup around word list returns and funbox wordset assignment. |
| frontend/src/ts/test/funbox/funbox-functions.ts | Reworks PolyglotWordset internals to use wordsMap and uniform language selection. |
6d1d342 to
fcdb490
Compare
|
For now, I am not sure where to store |
- Add strView to not duplicate data. (@IliyaZinoviev)
fcdb490 to
f1461fd
Compare
|
Whats strView doing? Can you add some documentation? |
|
Is it possible to change the approach? The string view thing seems a bit over engineered in my opinion? |
…y string objects. (@IliyaZinoviev)
|
@IliyaZinoviev I made a small (kinda) change where now we connect the wordset word and language together, instead of grabbing it from the class - that way it should be impossible for them to get out of sync in anyway. What do you think? |
This comment was marked as outdated.
This comment was marked as outdated.
IliyaZinoviev
left a comment
There was a problem hiding this comment.
I am OK with your commit. But I am not sure that remove of currentLanguage get method makes code better. Maybe it's improvement of next iteration.
| this.wordsetMap = wordsetMap; | ||
| this.resetIndexes(); | ||
| this.length = words.length; | ||
| this.currLang = this.langs[0] as Language; |
There was a problem hiding this comment.
Via WordsetPick, mutable state of PolyglotWordset (currentLanguage) was exterminated, especially its incorrect value on PolyglotWordset initialization (this.currLang = this.langs[0] as Language;). But to achieve it, Wordset class was extended with WordsetPick which included polyglot-specific property (language).
I don't see case when this.currLang becomes unsynchronized.
this.currLang can be optional and then it can be checked on Language value in get currentLanguage() to exclude unexpected undefined value in client code.
| } | ||
|
|
||
| private uniformLang(): Language { | ||
| private pickLang(): Language { |
There was a problem hiding this comment.
Now, mnemonic name is gone so we need docs to highlight uniform distribution inside this method
| randomWord = randomWord.replace(/ +/gm, " "); | ||
| randomWord = randomWord.replace(/(^ )|( $)/gm, ""); | ||
| randomWord = applyLazyModeToWord(randomWord, currentLanguage); | ||
| randomWord = applyLazyModeToWord(randomWord, currentLanguage, pick.language); |
There was a problem hiding this comment.
Why is currentLanguage passed as param to applyLazyModeToWord, but currentWordset isn't? But both of them is kinda global in words-generator.ts:
let currentWordset: Wordset | null = null;
let currentLanguage: LanguageObject | null = null;
My idea is to update currentLanguage from words-generator.ts on each new word from currentWordset.
PS. It's just thoughts.
| function applyLazyModeToWord( | ||
| word: string, | ||
| language: LanguageObject, | ||
| polyglotLang?: Language, |
There was a problem hiding this comment.
If Wordset class's methods return WordsetPick then case theoretically exists when some child Wordset class returns Language. So maybe not to name it as polyglotLang at all, just wordsetLang or something?
|
I can offer to try to make I don't have any related to polyglot offers. |
Add uniform distribution to polyglot
Motivation
During polyglot using, I mentioned that words from language with huge dictionaries are met always, but words from small one - never (at least from a user experience perspective, it looks like this). My PR solves this issue. And now each language is consumed per test as much as any other.
Description
Changed:
Checks
packages/schemas/src/languages.tsfrontend/src/ts/constants/languages.tsfrontend/static/languagespackages/schemas/src/themes.tsfrontend/src/ts/constants/themes.tsfrontend/static/themespackages/schemas/src/layouts.tsfrontend/static/layoutsfrontend/static/webfontspackages/schemas/src/fonts.tsfrontend/src/ts/constants/fonts.tsCloses #